Computational saliency models for still images have gained significant popularity in recent years. Saliency prediction from videos, on the other hand, has received relatively little interest from the community. Motivated by this, in this work we study the use of deep learning for dynamic saliency prediction and propose so-called spatio-temporal saliency networks. The key to our models is the two-stream network architecture, in which we investigate different fusion mechanisms to integrate spatial and temporal information. We evaluate our models on the DIEM and UCF-Sports datasets and present highly competitive results against existing state-of-the-art models. We also carry out experiments on a number of still images from the MIT300 dataset by exploiting optical flow maps predicted from these images. Our results show that considering inherent motion information in this way can be helpful for static saliency estimation.
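The two-stream fusion idea mentioned above can be sketched with simple array operations. The snippet below is a minimal illustration, not the paper's implementation: the feature maps are random placeholders standing in for deep convolutional features of an RGB frame (spatial stream) and an optical flow map (temporal stream), and the fusion weights are random rather than learned. It shows three common fusion mechanisms: element-wise sum, element-wise max, and convolutional (1x1) fusion over the concatenated channels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-stream feature maps (H x W x C); in a real model these
# would come from the last convolutional layers of each stream.
H, W, C = 8, 8, 4
spatial = rng.random((H, W, C))   # appearance stream (RGB frame features)
temporal = rng.random((H, W, C))  # motion stream (optical flow features)

def fuse_sum(a, b):
    # Element-wise sum fusion: streams must share spatial and channel dims.
    return a + b

def fuse_max(a, b):
    # Element-wise max fusion: keeps the stronger response per location.
    return np.maximum(a, b)

def fuse_conv(a, b, w):
    # 1x1 convolutional fusion: concatenate channels, then mix them with
    # weights w of shape (2C, C_out); w is random here for illustration.
    stacked = np.concatenate([a, b], axis=-1)  # H x W x 2C
    return stacked @ w

w = rng.standard_normal((2 * C, C))
for fused in (fuse_sum(spatial, temporal),
              fuse_max(spatial, temporal),
              fuse_conv(spatial, temporal, w)):
    assert fused.shape == (H, W, C)
```

Sum and max fusion require the two streams to have identical shapes, while convolutional fusion additionally lets the network learn how to weight appearance against motion per channel.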